
feat(isolation): add configurable worktree creation timeout #1029

Closed
norbinsh wants to merge 2 commits into coleam00:dev from norbinsh:archon/task-feat-configurable-worktree-timeout


@norbinsh
Contributor

@norbinsh norbinsh commented Apr 10, 2026

Summary

  • Problem: Worktree creation uses a hardcoded 30s timeout, causing failures for repos with heavy post-checkout hooks.
  • Why it matters: Users with complex post-checkout hooks (linting, dependency install, etc.) can't use worktree isolation without the creation timing out.
  • What changed: Added worktree.timeout config option (ms) that flows through the isolation layer to all worktree creation execFileAsync calls, with a 30000ms default for backward compatibility.
  • What did not change (scope boundary): Removal, listing, and branch cleanup operations keep their hardcoded timeouts. No changes to @archon/git package, MergedConfig, or GlobalConfig.

UX Journey

Before

  User                     .archon/config.yaml         WorktreeProvider
  ────                     ───────────────────         ────────────────
  triggers workflow ─────▶ worktree.baseBranch
                           worktree.copyFiles
                           (no timeout option)  ─────▶ hardcoded 30000ms
                                                       ↳ TIMEOUT if hooks > 30s ❌

After

  User                     .archon/config.yaml         WorktreeProvider
  ────                     ───────────────────         ────────────────
  triggers workflow ─────▶ worktree.baseBranch
                           worktree.copyFiles
                          *worktree.timeout: 60000* ─▶ uses config timeout (or 30000ms default)
                                                       ↳ succeeds with heavy hooks ✅

Architecture Diagram

Before

RepoConfig.worktree ──▶ WorktreeCreateConfig ──▶ WorktreeProvider
  baseBranch                baseBranch              hardcoded 30000ms
  copyFiles                 copyFiles

After

RepoConfig.worktree ──▶ WorktreeCreateConfig ──▶ WorktreeProvider
  baseBranch                baseBranch              config.timeout ?? 30000
  copyFiles                 copyFiles
  [~] timeout               [~] timeout
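The plumbing in the After diagram amounts to an optional field plus a nullish-coalescing fallback. A minimal TypeScript sketch, assuming simplified shapes: `RepoWorktreeConfig`, `toCreateConfig`, and `effectiveTimeout` are hypothetical names, and the real types carry more fields than shown.

```typescript
// Hypothetical, trimmed-down shapes; only `timeout` is the field this PR adds.
interface RepoWorktreeConfig {
  baseBranch?: string;
  /** Worktree creation timeout in milliseconds (default 30000). */
  timeout?: number;
}

interface WorktreeCreateConfig {
  baseBranch?: string;
  timeout?: number;
}

const DEFAULT_WORKTREE_TIMEOUT_MS = 30000;

// The whole worktree object is forwarded, so `timeout` rides along unchanged.
function toCreateConfig(repo: RepoWorktreeConfig | undefined): WorktreeCreateConfig {
  return { ...(repo ?? {}) };
}

// Provider side: configured value if present, otherwise the old hardcoded default.
function effectiveTimeout(config: WorktreeCreateConfig | undefined): number {
  return config?.timeout ?? DEFAULT_WORKTREE_TIMEOUT_MS;
}

console.log(effectiveTimeout(toCreateConfig({ timeout: 60000 }))); // 60000
console.log(effectiveTimeout(toCreateConfig(undefined))); // 30000
console.log(effectiveTimeout(toCreateConfig({ baseBranch: "dev" }))); // 30000
```

Note that `??` only replaces `null` and `undefined`: a configured `0`, a negative number, or a string produced by the YAML loader would pass straight through, which is the gap the later validation commit on this PR addresses.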

Connection inventory:

From                                             To                       Status     Notes
RepoConfig.worktree                              WorktreeCreateConfig     modified   Added timeout?: number
WorktreeCreateConfig                             WorktreeProvider.create  modified   Timeout threaded to creation methods
WorktreeProvider → execFileAsync (creation)      N/A                      modified   11 instances use config timeout
WorktreeProvider → execFileAsync (removal/list)  N/A                      unchanged  Keep hardcoded timeouts
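The split in the inventory (creation calls take the configured value, removal and listing keep a constant) can be sketched with an injected exec function so that no real git process runs. `ExecFn`, `createWorktree`, and `removeWorktree` are hypothetical; the actual provider method names and git arguments are not shown on this PR page.

```typescript
// Assumed shape of an execFileAsync-like function: (file, args, options).
type ExecOptions = { timeout: number };
type ExecFn = (file: string, args: string[], options: ExecOptions) => Promise<void>;

// Removal, listing, and branch cleanup keep their hardcoded budget.
const REMOVAL_TIMEOUT_MS = 30000;

// Creation path: the git call receives the configured timeout.
async function createWorktree(
  exec: ExecFn,
  worktreePath: string,
  branch: string,
  timeout: number,
): Promise<void> {
  await exec("git", ["worktree", "add", "-b", branch, worktreePath], { timeout });
}

// Removal path: unaffected by worktree.timeout.
async function removeWorktree(exec: ExecFn, worktreePath: string): Promise<void> {
  await exec("git", ["worktree", "remove", "--force", worktreePath], {
    timeout: REMOVAL_TIMEOUT_MS,
  });
}

// Usage with a recording fake instead of a real subprocess.
const calls: Array<{ args: string[]; timeout: number }> = [];
const fakeExec: ExecFn = async (_file, args, options) => {
  calls.push({ args, timeout: options.timeout });
};

async function demo(): Promise<void> {
  await createWorktree(fakeExec, "/tmp/wt", "feature-x", 60000);
  await removeWorktree(fakeExec, "/tmp/wt");
  for (const call of calls) {
    console.log(`${call.args[1]}: ${call.timeout}ms`);
  }
}

void demo();
```

Injecting the exec function keeps the timeout routing testable without a repository on disk.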

Label Snapshot

  • Risk: risk: low
  • Size: size: S
  • Scope: isolation, config
  • Module: isolation:worktree-provider, core:config-types

Change Metadata

  • Change type: feature
  • Primary scope: isolation

Linked Issue

N/A — Feature request from workflow usage observation.

Validation Evidence (required)

bun run validate  # ✅ All pass
  • Type check: ✅ (9 packages)
  • Lint: ✅ (0 errors, 0 warnings)
  • Format: ✅
  • Tests: ✅ (all packages, 2901+ tests)
  • Build: ⚠️ Pre-existing @archon/web failure (missing mdast-util-gfm dep, identical on dev)

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Compatibility / Migration

  • Backward compatible? Yes — default remains 30000ms when not configured
  • Config/env changes? Yes — new optional worktree.timeout in .archon/config.yaml
  • Database migration needed? No

Human Verification (required)

  • Verified scenarios: bun run validate passes, diff reviewed for all 15 timeout instances (11 updated, 4 kept)
  • Edge cases checked: Omitting config (default 30000ms); the config injection path already passes the full worktree object
  • What was not verified: Manual test with an actual heavy post-checkout hook repo

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: Only worktree creation in @archon/isolation
  • Potential unintended effects: None — pure config plumbing with default fallback
  • Guardrails/monitoring for early detection: Existing isolation tests cover all creation paths

Rollback Plan (required)

  • Fast rollback command/path: Revert this PR's commits
  • Feature flags or config toggles: N/A (opt-in via config, default unchanged)
  • Observable failure symptoms: Worktree creation timeouts (same as before this PR)

Risks and Mitigations

  • Risk: User sets unreasonably large timeout, causing long hangs on broken repos
    • Mitigation: This is user-controlled config; no worse than current hardcoded behavior. User can always Ctrl+C.

Summary by CodeRabbit

  • New Features
    • Added a configurable timeout for worktree creation operations. Users can set the timeout (milliseconds) for git steps during worktree setup; default is 30,000 ms.
    • Timeout is applied to PR and non-PR worktree creation and is validated as a positive number.
    • Branch-deletion retry behavior continues to use the fixed 30,000 ms timeout.

Add `worktree.timeout` option (ms) to `.archon/config.yaml` that flows
through the isolation layer to worktree creation execFileAsync calls,
replacing the hardcoded 30000ms timeout. Default remains 30000ms for
backward compatibility. Fixes repos with heavy post-checkout hooks that
exceed 30s.

- Add `timeout?: number` to RepoConfig.worktree and WorktreeCreateConfig
- Thread timeout through WorktreeProvider creation methods (11 instances)
- Keep hardcoded timeouts for removal, listing, and branch cleanup ops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 10, 2026


Walkthrough

This change makes the git worktree operation timeout configurable. A new optional timeout?: number (milliseconds, default 30000) was added to worktree configuration types and is propagated through worktree creation paths so git command invocations use the configured timeout instead of a hardcoded 30000ms.

Changes

Cohort / File(s) Summary
Config Types
packages/core/src/config/config-types.ts, packages/isolation/src/types.ts
Added optional timeout?: number to RepoConfig.worktree and WorktreeCreateConfig with documented default 30000ms.
Worktree Provider
packages/isolation/src/providers/worktree.ts
Derive timeout from config (validated positive number, fallback 30000) and pass it into PR and branch worktree creation helpers; replaced hardcoded { timeout: 30000 } on git exec calls with the provided timeout. Note: the git branch -D retry still uses a hardcoded 30000ms.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble the config, a timeout in hand,
I thread it through branches across this land.
Thirty seconds to start, or longer if planned,
Worktrees now listen — from burrow to strand. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(isolation): add configurable worktree creation timeout' accurately and concisely describes the main change—adding a new configurable timeout option for worktree creation in the isolation layer.
Description check ✅ Passed The PR description comprehensively covers all major template sections: Summary with problem/why/what changed/scope boundary, detailed UX Journey before/after, Architecture Diagram with connection inventory, proper Labels, Change Metadata, Validation Evidence with bun run validate results, Security Impact assessment, Compatibility/Migration details, Human Verification notes, Side Effects analysis, and Rollback Plan. All required sections are present and substantive.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/isolation/src/providers/worktree.ts`:
- Around line 559-560: Validate the user-provided worktree timeout before using
it: check worktreeConfig?.timeout (the value assigned to the local variable
timeout) is a finite positive integer within an acceptable bound (e.g., >0 and
<= some safe max like 5 minutes), throw a clear error if it is missing/invalid,
and use the sanitized value (e.g., validatedTimeout) when building process
options; update the validation in the worktree creation flow in this file
(around the const timeout = worktreeConfig?.timeout ?? 30000 line) so invalid
YAML-cast values are rejected early.

📥 Commits

Reviewing files that changed from the base of the PR and between 95679fa and d951863.

📒 Files selected for processing (3)
  • packages/core/src/config/config-types.ts
  • packages/isolation/src/providers/worktree.ts
  • packages/isolation/src/types.ts

Guard against invalid YAML-cast values (e.g., strings, negatives, NaN)
by checking the type before using the config timeout. Falls back to
30000ms default for any nonsensical value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
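The guard in this commit (the `-` lines of the diff in the next review) can be modeled as a small function to see which inputs it rejects. `resolveTimeout` is a hypothetical name, and typing the input as `unknown` stands in for whatever the YAML loader produced.

```typescript
const DEFAULT_TIMEOUT_MS = 30000;

// Mirrors the committed check: only a positive number is accepted.
function resolveTimeout(raw: unknown): number {
  return typeof raw === "number" && raw > 0 ? raw : DEFAULT_TIMEOUT_MS;
}

console.log(resolveTimeout(60000)); // 60000
console.log(resolveTimeout("60000")); // 30000: a quoted YAML value is a string
console.log(resolveTimeout(-5)); // 30000
console.log(resolveTimeout(Number.NaN)); // 30000: NaN > 0 is false
console.log(resolveTimeout(Number.POSITIVE_INFINITY)); // Infinity: still accepted
```

Infinity slips through (a YAML `.inf` value parses to it), which is exactly what the follow-up review flags.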

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
packages/isolation/src/providers/worktree.ts (1)

559-562: ⚠️ Potential issue | 🟡 Minor

Clamp and finite-check worktree.timeout before using it in process options.

This check still accepts Infinity and unbounded large numbers. Please sanitize to a finite positive integer and cap to a safe upper bound before passing it to git calls.

🔧 Proposed hardening
-    const timeout =
-      typeof worktreeConfig?.timeout === 'number' && worktreeConfig.timeout > 0
-        ? worktreeConfig.timeout
-        : 30000;
+    const configuredTimeout = worktreeConfig?.timeout;
+    const timeout =
+      typeof configuredTimeout === 'number' &&
+      Number.isFinite(configuredTimeout) &&
+      configuredTimeout > 0 &&
+      configuredTimeout <= 2_147_483_647
+        ? Math.floor(configuredTimeout)
+        : 30000;
In Node.js child_process.execFile options, what are the valid bounds/behavior for `timeout`, specifically for `Infinity` and values greater than 2^31-1 milliseconds?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/isolation/src/providers/worktree.ts` around lines 559 - 562, The
computed timeout value from worktreeConfig?.timeout may be Infinity or exceed
platform limits; change the logic around the timeout variable so you first
verify Number.isFinite(worktreeConfig.timeout) and that it is > 0, coerce to an
integer (e.g., Math.floor), then clamp it to a safe upper bound such as
MAX_TIMEOUT = 2**31 - 1 (2147483647) before using timeout in process options for
git calls; update references to timeout (the const defined from
worktreeConfig?.timeout) to use this sanitized value.
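The prompt asks to clamp, while the proposed diff falls back to 30000 for values above the bound. The diff also floors after checking `> 0`, so a value such as 0.5 becomes 0; in Node's `child_process`, a `timeout` of 0 means no timeout at all. A minimal sketch that clamps and rejects anything flooring below 1ms follows. `sanitizeWorktreeTimeout` is a hypothetical name, and whether Bun's subprocess handling matches Node's here is not established on this page.

```typescript
const DEFAULT_TIMEOUT_MS = 30000;
// 2^31 - 1 ms: the usual upper limit for JavaScript timer delays.
const MAX_TIMEOUT_MS = 2_147_483_647;

// Hypothetical helper: finite check, integer coercion, then clamp.
function sanitizeWorktreeTimeout(raw: unknown): number {
  if (typeof raw !== "number" || !Number.isFinite(raw)) {
    return DEFAULT_TIMEOUT_MS; // strings, NaN, Infinity
  }
  const whole = Math.floor(raw);
  // Values below 1ms would floor to 0 (or stay negative); a timeout of 0
  // disables the kill timer in Node's child_process, so fall back instead.
  if (whole < 1) {
    return DEFAULT_TIMEOUT_MS;
  }
  return Math.min(whole, MAX_TIMEOUT_MS); // clamp rather than reject
}

console.log(sanitizeWorktreeTimeout(60000.7)); // 60000
console.log(sanitizeWorktreeTimeout(0.5)); // 30000
console.log(sanitizeWorktreeTimeout(3e9)); // 2147483647
console.log(sanitizeWorktreeTimeout(Number.POSITIVE_INFINITY)); // 30000
console.log(sanitizeWorktreeTimeout("60000")); // 30000
```

Clamping keeps a user's intent ("as long as possible") instead of silently dropping back to 30 seconds, which matters given the PR's own risk note about very large values.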

📥 Commits

Reviewing files that changed from the base of the PR and between d951863 and 25dc450.

📒 Files selected for processing (1)
  • packages/isolation/src/providers/worktree.ts

tasty007 pushed a commit to tasty007/Archon that referenced this pull request Apr 10, 2026
…nd subprocesses (coleam00#1030)

* Fix: prevent target repo .env leakage into Claude subprocess (coleam00#1029)

Bun auto-loads CWD .env before user code runs. When archon runs from a
target repo, that repo's ANTHROPIC_API_KEY (and other secrets) leaked into
process.env and were passed through to the Claude Code subprocess, billing
the wrong account.

Changes:
- Add SUBPROCESS_ENV_ALLOWLIST + buildCleanSubprocessEnv() utility
- claude.ts buildSubprocessEnv now starts from allowlist, not process.env
- CLI and server entry points strip all keys parsed from CWD .env on startup
  (replaces DATABASE_URL-only patch)
- Update claude.test.ts to assert ANTHROPIC_API_KEY no longer reaches subprocess
- Add env-allowlist tests

Fixes coleam00#1029

* fix: address review findings for env allowlist PR

- Register env-allowlist.test.ts in @archon/core test batch so security tests run in CI
- Remove CLAUDECODE, NODE_OPTIONS, VSCODE_INSPECTOR_OPTIONS from allowlist (they are always stripped by caller — listing them was semantically contradictory)
- Add intent comment to silent dotenv parse fallback in cli.ts and server/index.ts
- Add test for useGlobalAuth=false path verifying ANTHROPIC_API_KEY excluded from subprocess env
- Update security.md to document full CWD .env isolation and subprocess allowlist (was only describing DATABASE_URL)
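The allowlist approach in this commit inverts the default: the child starts from an empty environment and only named keys are copied in. A sketch under assumptions: the commit names `SUBPROCESS_ENV_ALLOWLIST` and `buildCleanSubprocessEnv()` but not their contents, so the list below and both signatures are hypothetical, and `stripCwdDotenvKeys` is an illustrative stand-in for the startup step.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical allowlist; the real list is not spelled out in this commit.
const SUBPROCESS_ENV_ALLOWLIST: readonly string[] = [
  "PATH",
  "HOME",
  "USER",
  "SHELL",
  "TERM",
  "LANG",
  "TMPDIR",
];

// Start from an empty object and copy only allowlisted keys, instead of
// spreading process.env (which would carry keys Bun loaded from the CWD .env).
function buildCleanSubprocessEnv(
  source: Env,
  allowlist: readonly string[] = SUBPROCESS_ENV_ALLOWLIST,
): Env {
  const clean: Env = {};
  for (const key of allowlist) {
    const value = source[key];
    if (value !== undefined) {
      clean[key] = value;
    }
  }
  return clean;
}

// Startup step: delete every key that was parsed from the CWD .env file.
function stripCwdDotenvKeys(env: Env, dotenvKeys: Iterable<string>): void {
  for (const key of dotenvKeys) {
    delete env[key];
  }
}

const parentEnv: Env = {
  PATH: "/usr/bin",
  HOME: "/home/dev",
  ANTHROPIC_API_KEY: "sk-from-target-repo",
};
console.log("ANTHROPIC_API_KEY" in buildCleanSubprocessEnv(parentEnv)); // false
stripCwdDotenvKeys(parentEnv, ["ANTHROPIC_API_KEY"]);
console.log(Object.keys(parentEnv).length); // 2
```

Copying by allowlist means a secret newly added to some repo's .env is excluded by default, whereas a denylist would need updating for every new key.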
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
…nd subprocesses (coleam00#1030)

* Fix: prevent target repo .env leakage into Claude subprocess (coleam00#1029)

Bun auto-loads CWD .env before user code runs. When archon runs from a
target repo, that repo's ANTHROPIC_API_KEY (and other secrets) leaked into
process.env and were passed through to the Claude Code subprocess, billing
the wrong account.

Changes:
- Add SUBPROCESS_ENV_ALLOWLIST + buildCleanSubprocessEnv() utility
- claude.ts buildSubprocessEnv now starts from allowlist, not process.env
- CLI and server entry points strip all keys parsed from CWD .env on startup
  (replaces DATABASE_URL-only patch)
- Update claude.test.ts to assert ANTHROPIC_API_KEY no longer reaches subprocess
- Add env-allowlist tests

Fixes coleam00#1029

* fix: address review findings for env allowlist PR

- Register env-allowlist.test.ts in @archon/core test batch so security tests run in CI
- Remove CLAUDECODE, NODE_OPTIONS, VSCODE_INSPECTOR_OPTIONS from allowlist (they are always stripped by caller — listing them was semantically contradictory)
- Add intent comment to silent dotenv parse fallback in cli.ts and server/index.ts
- Add test for useGlobalAuth=false path verifying ANTHROPIC_API_KEY excluded from subprocess env
- Update security.md to document full CWD .env isolation and subprocess allowlist (was only describing DATABASE_URL)
@Wirasm
Collaborator

Wirasm commented Apr 17, 2026

This PR appears to fully address #1119. Consider adding Closes #1119 to the PR body so the issue auto-closes on merge.

@Wirasm
Collaborator

Wirasm commented Apr 20, 2026

Thanks for the careful scoping and the detailed writeup, @norbinsh — the underlying problem (30s is too tight when repos have heavy post-checkout hooks) is real, and your PR body made it easy to evaluate.

After thinking about it though, I don't think a new config key is the right primitive here. A few reasons:

  1. Config bloat cost. .archon/config.yaml keys carry permanent maintenance + documentation weight. We try to avoid adding them without multiple concrete user reports (per the YAGNI guidance in CLAUDE.md).
  2. Half-measure. The same timeout: 30000 pattern appears in ~15 sites across worktree.ts — this PR makes 11 configurable and leaves 4 hardcoded (removal, branch-delete, listing). Either they all share one knob or none do.
  3. Cheaper fix exists. Just bumping the default from 30s to something generous (e.g., 5 min) covers the reported case without any new API surface. Genuine hangs (credential prompts in non-TTY, network stalls) still get caught by the ceiling; heavy post-checkout hooks stop failing.

I'm going to close this in favor of #XXXX, which consolidates all 15 sites onto a single GIT_OPERATION_TIMEOUT_MS = 300_000 constant. If you later hit a case where 5 min is also too tight, that's a strong signal we should revisit adding the config key — but let's cross that bridge if it comes up.

Really appreciate you surfacing the problem and writing up #1119. Crediting you in the replacement PR.
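The consolidation described in this comment replaces per-call literals with one shared constant. A minimal sketch: `GIT_OPERATION_TIMEOUT_MS = 300_000` comes from the comment, while `GitExecOptions` and `gitExecOptions` are hypothetical.

```typescript
// One budget for every git subprocess: creation, removal, listing, branch cleanup.
const GIT_OPERATION_TIMEOUT_MS = 300_000; // 5 minutes

interface GitExecOptions {
  cwd: string;
  timeout: number;
}

// Hypothetical helper so no call site repeats a literal timeout.
function gitExecOptions(cwd: string): GitExecOptions {
  return { cwd, timeout: GIT_OPERATION_TIMEOUT_MS };
}

const creation = gitExecOptions("/repo");
const removal = gitExecOptions("/repo/.worktrees/feature-x");
console.log(creation.timeout === removal.timeout); // true
console.log(creation.timeout / 60_000); // 5
```

Because every site reads the same constant, raising or lowering the ceiling later is a one-line change, which is the "either they all share one knob or none do" point above.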

Wirasm added a commit that referenced this pull request Apr 20, 2026
All 15 worktree git-subprocess timeouts in WorktreeProvider were hardcoded
at 30000ms. Repos with heavy post-checkout hooks (lint, dependency install,
submodule init) routinely exceed that budget and fail worktree creation.

Consolidate them onto a single GIT_OPERATION_TIMEOUT_MS constant at 5 min.
Generous enough to cover reported cases while still catching genuine hangs
(credential prompts in non-TTY, stalled fetches).

Chosen over the config-key approach in #1029 to avoid adding permanent
.archon/config.yaml surface for a problem a raised default solves cleanly.
If 5 min turns out to also be too tight for real-world use, we'll revisit.

Closes #1119
Supersedes #1029

Co-authored-by: Shay Elmualem <12733941+norbinsh@users.noreply.github.com>
@Wirasm
Collaborator

Wirasm commented Apr 20, 2026

Opened the replacement: #1306. Closing this in its favor. Thanks again @norbinsh — credited you in the commit and PR body.

@Wirasm Wirasm closed this Apr 20, 2026
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
…nd subprocesses (coleam00#1030)

* Fix: prevent target repo .env leakage into Claude subprocess (coleam00#1029)

Bun auto-loads CWD .env before user code runs. When archon runs from a
target repo, that repo's ANTHROPIC_API_KEY (and other secrets) leaked into
process.env and were passed through to the Claude Code subprocess, billing
the wrong account.

Changes:
- Add SUBPROCESS_ENV_ALLOWLIST + buildCleanSubprocessEnv() utility
- claude.ts buildSubprocessEnv now starts from allowlist, not process.env
- CLI and server entry points strip all keys parsed from CWD .env on startup
  (replaces DATABASE_URL-only patch)
- Update claude.test.ts to assert ANTHROPIC_API_KEY no longer reaches subprocess
- Add env-allowlist tests

Fixes coleam00#1029

* fix: address review findings for env allowlist PR

- Register env-allowlist.test.ts in @archon/core test batch so security tests run in CI
- Remove CLAUDECODE, NODE_OPTIONS, VSCODE_INSPECTOR_OPTIONS from allowlist (they are always stripped by caller — listing them was semantically contradictory)
- Add intent comment to silent dotenv parse fallback in cli.ts and server/index.ts
- Add test for useGlobalAuth=false path verifying ANTHROPIC_API_KEY excluded from subprocess env
- Update security.md to document full CWD .env isolation and subprocess allowlist (was only describing DATABASE_URL)